Design

1 Column-oriented and Flexible

One of the most distinctive design of cpprb is column-oriented flexibly defined transitions. As far as we know, other replay buffer implementations adopt row-oriented flexible transitions (aka. array of transition class) or column-oriented non-flexible transitions.

In deep reinforcement learning, sampled batch is divided into variables (i.e. obs, act, etc.). If the sampled batch is row-oriented, users (or library) need to convert it into column-oriented one. (See doc, too)

2 Batch Insertion

cpprb can accept addition of multiple transitions simultaneously. This design is convenient when batch transitions are moved from local buffers to a global buffer. Moreover it is more efficient because of not only removing pure-Python for loop but also suppressing unnecessary priority updates for PER. (See doc, too)

3 Minimum Dependency

We try to minimize dependency. Only NumPy is required during its execution. Small dependency is always preferable to avoid dependency hell.